Abstract

As digital video has become widespread, quality considerations have become essential, and industry
demand for video quality measurement is rising. This paper proposes a method of perceptual quality assessment
in an H.264 encoder using objective modeling. Quality impairments are calculated, and a model is developed
to compute a perceptual video quality metric using a no-reference method. The subtle differences between
the original video and the encoded video degrade the quality of the encoded picture; these differences are
introduced by encoding processes such as intra and inter prediction. The proposed model takes into account
the artifacts introduced by these spatial and temporal activities in hybrid block-based coding methods and
maps them objectively onto a subjective quality estimate. It computes the objective quality metric from the
subjective impairments blockiness, blur and jerkiness, in contrast to the bitrate-only calculation defined in
the ITU-T G.1070 model. The accuracy of the proposed perceptual video quality metric is compared against
popular full-reference objective methods as defined by VQEG.
Introduction

Digital video, in applications such as digital television,
internet streaming, digital cinema, video on demand, video
telephony and video conferencing, is a predominant part of
daily life, and these video and multimedia applications are
growing fast. In this large application space, various service
providers offer solutions to end customers. Digital video
typically goes through several stages of processing before it
reaches the end user, which results in video quality
degradation. The challenge for service providers is therefore
to guarantee an appropriate Quality of Experience (QoE) for
the end user in order to avoid churn. Quality assessment for
speech has a long history and is well established, and
extensive work is under way to extend quality assessment to
audio and video. The need for an accurate and reliable method
of video quality measurement has grown with new digital video
applications and services such as mobile TV, streaming video
and IPTV. In general, quality measurement has a wide range of
uses, such as codec evaluation, headend quality assurance,
in-service network monitoring and end equipment quality
measurement.

Quality assessment methods can be divided into objective
and subjective measurement. Objective methods use
mathematical models to approximate the behavior of the human
visual system. In subjective assessment, observers view the
video and give their opinion of its quality. The average of
their opinions gives the Mean Opinion Score (MOS), which
provides the measure of subjective quality.

Continuous quality monitoring in an in-service mode is
beneficial for service providers and end users. In-service
quality monitoring techniques are required to have low
computational complexity and high correlation with MOS, and
to yield a metric that can be used meaningfully. Perceptual
quality metrics are algorithms designed to model the quality
of video and predict end user opinion objectively. Based on
the method of objective metric calculation, they are
generally classified as follows (Winkler 2009):

(a) Data metrics, such as peak signal-to-noise ratio
(PSNR) and mean square error (MSE), which measure
the fidelity of the signal without considering the
content characteristics.

(b) Picture metrics, which process the visual
information in the video data and account for the
effect of distortions in the content on perceived
video quality.

(c) Bitstream or packet-parameter-based metrics for
compressed video delivered over packet networks.

(d) Hybrid metrics, which are derived from a
combination of the above.

Additionally, based on the amount of reference information
required, they are classified as follows:

(a) Full-reference (FR) metrics measure the
degradation in the test video with respect to a
reference video.

(b) No-reference (NR) metrics analyze the test video
without the need for an explicit reference clip.

(c) Reduced-reference (RR) metrics are a tradeoff between
FR and NR metrics in terms of the reference
information required. The comparison between the
test video and the reference video is based on the
extracted information.

Since video compression schemes need to address impairments
related to block-based prediction in the spatial and temporal
domains, any metric suited to in-service application should
be based on picture metrics and a no-reference model. Because
picture metrics capture the cumulative effect of compression
on video quality, a picture-metric-based video quality
measurement is proposed in this paper and applied to the
H.264 compression scheme (ITU-T 2005) for headend quality
assurance. Even though the reference video is available in
the H.264 encoder, only a no-reference model is proposed,
since the scheme should extend to different in-service
quality monitoring applications and should have low
computational complexity.

Because the compression standard is block based and the
problem can be generalized over the transform block size, the
proposed blockiness, blur and jerkiness metrics are computed
at block level. The approach can be generalized to any
block-based coding scheme and to different transform sizes.
We present a MOS calculation based on the impairments
blockiness, blur and jerkiness, in which the MOS model
carries the cumulative effect of all three metrics. The
computation of these impairment metrics is in accordance with
the ITU-T P.910 (1999) standard. To assess the correctness
and effectiveness of the model, experimental results are
compared against the well-known full-reference quality
assessment method SSIM.

The paper is organized as follows. Section II provides
details of related work in the proposed research area.
Section III explains the motivation and the proposed
perceptual video quality model. Section IV outlines the
design, and Section V covers the performance evaluation and
discussion. Section VI contains the concluding remarks.

Related work 

Among the quality metrics used to assess video quality,
Abharana et al. (2009) propose an objective full-reference
metric that uses the natural decrease in entropy of the
decoded frame due to compression, the vertical and horizontal
artifacts due to the blockiness effect, and the spatial and
temporal masking properties of the human visual system; the
metric is compared against other standard full-reference
metrics. A no-reference metric, however, has the advantage in
terms of computational complexity and reference availability.
Many works (Brandao et al. 2009; Arum et al. 2012) have
experimented with quality assessment of compressed video.
Eden (2007) takes peak signal-to-noise ratio (PSNR), which is
a full-reference metric, as the measure of picture quality
and estimates it statistically from the transform
coefficients so that it serves as a no-reference metric. A
revised no-reference PSNR model is presented in Brandao and
Queluz (2010); it estimates video quality from DCT
coefficients that are estimated using maximum likelihood
techniques. Joskowicz and Ardao (2010) propose a perceptual
MOS calculation model based on the display format and on
content spatial-temporal activity computed from the average
SAD, and derive the relationship between bitrate and MOS.
Using only the bitrate, however, limits the estimation
quality for certain video services. Valenzise et al. (2012)
propose estimating the pattern of lost macroblocks, which
yields an accurate estimate of the mean square error (MSE)
distortion introduced by channel errors. The results of that
method correlate well with the MSE distortion computed in
full-reference mode, with a linear correlation coefficient of
0.9 at frame level. Kawano et al. (2010) propose a two-part
no-reference metric calculation consisting of a training
phase and a test phase. In the training phase, they calculate
the sensitivity from features such as blockiness, blur and
edge busyness, and rank these features using Principal
Component Analysis (PCA). Rossholm and Lovstroem (2008) try
to find a linear relationship between the quality measurement
and media-layer metrics such as quantization parameter, bits
per frame, frame rate and mean motion vector length. The
method proposed in Ries et al. (2007) calculates video
quality from parameters such as bitrate, zero-length motion
vectors, mean motion vector length and motion vector
direction. Even though bitrate is a key parameter (ITU-T
G.1070 2012) for estimating the coding distortion, the
subjective quality of different video sequences cannot be
correlated well with the bitrate alone. The proposed method
therefore uses the impairments blockiness, blur and
jerkiness, introduced by the spatial and temporal activities,
to improve the estimation accuracy in the encoder for headend
quality assurance.

Proposed perceptual quality estimation model 

Because all block-based video codecs are lossy, compression
introduces video artifacts that are noticeable to the human
visual system. In any application, video quality is an
important factor in the user's Quality of Experience (QoE).
To define QoE, quality measurement standardization bodies are
working to define MOS as the measure and to define a method
of MOS prediction that is reliable and reproducible. Even
though some of these objectives are achieved in the existing
standards, research continues in order to address specific
applications. The idea proposed in this paper is an
NR-metric-based perceptual quality assessment that can be
used for continuous monitoring in different applications. At
the headend, it can be implemented as part of the encoder
without much complication for in-service assessment of
delivery quality.

The performance of reference-based quality assessment methods
is limited by the quality of the source video and by video
sequence alignment. The no-reference (NR) approach is an
absolute quality assessment of the video as viewed by the
user, which is more useful in end-to-end performance
monitoring scenarios. Quality assessment is a challenging
task when there is no reference. The NR method has the
advantage of permitting in-service, real-time assessment
because of its low computational complexity.

The NR metrics for video blockiness, blur and jerkiness are
calculated, and the perceptual quality assessment model for
the codec at a given bitrate is derived in accordance with
ITU-T G.1070. For a set of training and test video sequences,
the perceptual quality based on the proposed assessment model
is computed and presented. The correctness and effectiveness
of the model are evaluated by comparison against the
well-known full-reference quality metric SSIM, following the
methods provided by VQEG.



The video quality parameter I_coding for an optimal frame
rate is defined in ITU-T G.1070 as follows.

Here Br_v is the bitrate and I_coding is the assessment of
coding quality artifacts, whose value varies from 0 to 4.
The perceptual quality metric covers only the coding-based
quality impairments and provides the quality metric at the
headend.

v3, v4 and v5 are constants. Because any change in v4 greatly
affects the MOS value, v4 is obtained from the no-reference
blockiness, blur and jerkiness. The proposed MOS calculation
uses v4 as a combined scaled distortion indicator that
carries the effect of all three impairments, along with the
bitrate.
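
For reference, the coding-quality term of ITU-T G.1070 at the optimal frame rate, written with the symbols used in this section, takes the form below. The second line is the G.1070 video quality expression with the transmission (packet loss) factor set to one, which is an assumption about the coding-only case described here; the equation numbers are likewise assumed to correspond to the equations (1) and (2) cited in the text.

```latex
\begin{align}
I_{coding} &= v_3 \left( 1 - \frac{1}{1 + \left( \dfrac{Br_v}{v_4} \right)^{v_5}} \right) \tag{1} \\
MOS &= 1 + I_{coding} \tag{2}
\end{align}
```

With I_coding between 0 and 4, MOS falls between 1 and 5. At a fixed bitrate and positive v5, a larger v4 lowers I_coding, which is consistent with v4 acting as a distortion indicator.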

Design overview 

This section presents the details of the intra-frame metrics,
blockiness and blur, and the inter-frame metric, jerkiness,
at block level. The perceptual quality estimate is based on
these calculated intra-frame and inter-frame metrics. The
proposed model uses no-reference metrics, which also reduces
the computational complexity.

Blockiness metric 

The blockiness metric is a measure of the visible edges on
the block boundaries of the coded picture. It is calculated
from the Boundary Strength (BS) of the transform block
boundaries, which is part of the encoder standard. The amount
of blockiness present over a window of frames is accumulated,
and a normalized blockiness metric (BM) is computed from it.
A BS value of 4 indicates high blockiness and a BS value of 0
indicates low blockiness. To calculate the amount of
blockiness, all block boundaries that have BS equal to 4 in
intra-coded frames and BS equal to 2 in inter-coded frames
are counted. This count is accumulated over a frame, and the
normalized BM is calculated from it and converted to a
percentage, so the value of BM lies between 0 and 100.
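
The counting rule above can be sketched as follows. This is a minimal illustration, not the JM implementation: the function name, the input layout (a frame type plus a flat list of BS values per frame) and the choice to normalize by the total number of boundaries are all assumptions.

```python
def blockiness_metric(frames):
    """Return a blockiness percentage (0..100) over a window of frames.

    frames: iterable of (frame_type, bs_values), where frame_type is "I" for
    intra-coded frames (anything else is treated as inter-coded) and
    bs_values holds the boundary strength (0..4) of each block boundary.
    Normalizing by the total boundary count is an assumption of this sketch.
    """
    counted = 0
    total = 0
    for frame_type, bs_values in frames:
        # BS == 4 is counted in intra frames, BS == 2 in inter frames.
        target = 4 if frame_type == "I" else 2
        counted += sum(1 for bs in bs_values if bs == target)
        total += len(bs_values)
    return 100.0 * counted / total if total else 0.0


window = [("I", [4, 4, 3, 0]), ("P", [2, 1, 0, 0])]
bm = blockiness_metric(window)
```

In this toy window, three of the eight boundaries match the rule, so the sketch reports the share of matching boundaries as a percentage.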

Blur metric 

The blur metric (BL) is defined as the loss of energy and the
reduction of spatial detail on sharp edges: if an edge that
should be sharp has greater depth in its edge pixels, the
image is considered more blurred. The metric uses a Sobel
filter to identify the sharp edges and so calculate the
localized blur metric in a frame. Once the blur regions are
identified, the transform coefficients of the blocks that
fall in the blur regions are used for the calculation of
blur. The blur count is a weighted count of the frequency
components, taken across all the blocks under computation,
that occur sufficiently often relative to the low-frequency
components; this count is then normalized to obtain BL. The
value of BL lies between 0 and 100.
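
The edge-identification step can be illustrated with a small Sobel sketch that flags the blocks containing strong edge pixels. It covers only the selection of candidate blocks; the weighting of transform coefficients is not specified in enough detail here to reproduce. The block size, the threshold and the function names are hypothetical.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(frame):
    """Return |Gx| + |Gy| per pixel for a grayscale frame (list of rows).

    Border pixels are left at 0 because the 3x3 window does not fit there.
    """
    height, width = len(frame), len(frame[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    pixel = frame[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            out[y][x] = abs(gx) + abs(gy)
    return out


def edge_blocks(frame, block_size=4, threshold=200):
    """Return sorted (block_row, block_col) indices holding a strong edge."""
    selected = set()
    for y, row in enumerate(sobel_magnitude(frame)):
        for x, value in enumerate(row):
            if value >= threshold:  # hypothetical edge threshold
                selected.add((y // block_size, x // block_size))
    return sorted(selected)


# 8x12 frame with a vertical step edge between columns 3 and 4.
step_frame = [[0] * 4 + [100] * 8 for _ in range(8)]
blocks = edge_blocks(step_frame)
```

Only the two block columns that touch the step edge are selected; the flat region to the right is skipped, so the later coefficient analysis would be restricted to blocks near edges.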

Jerkiness metric 

Video sequences with slow camera movement or zoom are prone
to the jerkiness artifact. This metric is calculated as the
normalized number of transitions between states at macroblock
level. A threshold on the mean square error determines
whether a macroblock has been updated or not. The update
status of the macroblocks across a window of frames provides
the jerkiness artifact measure (JR). It is computed as the
maximum over time of the standard deviation over space across
all the frames. More motion between adjacent frames results
in a higher value of JR.
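
One possible reading of this description is sketched below: each macroblock is flagged as updated when its mean square error against the co-located macroblock of the previous frame exceeds a threshold, the standard deviation of the flags is taken over space for each frame, and the maximum over the window is reported. The input layout, the threshold and the use of the population standard deviation are assumptions.

```python
from statistics import pstdev


def jerkiness_metric(mse_per_frame, threshold):
    """Return the maximum over time of the spatial std. dev. of update flags.

    mse_per_frame: one non-empty list per frame in the window, holding the
    MSE of each macroblock against the co-located macroblock of the previous
    frame. threshold is a hypothetical tuning value.
    """
    spatial_spread = []
    for macroblock_mse in mse_per_frame:
        # 1 = macroblock considered updated, 0 = not updated.
        updated = [1 if mse > threshold else 0 for mse in macroblock_mse]
        spatial_spread.append(pstdev(updated))
    return max(spatial_spread) if spatial_spread else 0.0


window_mse = [[0, 0, 0, 0], [10, 0, 10, 0], [10, 10, 10, 10]]
jr = jerkiness_metric(window_mse, threshold=5)
```

In the toy window, the frame in which half of the macroblocks are updated has the largest spatial spread and therefore determines the result.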

All the above artifacts are computed as part of the H.264
encoder, along with the perceptual quality metric calculation
of the proposed model. Figure 1 shows a modified block
diagram of the H.264 encoder in which the perceptual quality
metric is calculated in service. This provides a MOS score
for the video sequence along with PSNR, so the user can
understand the subjective quality of the encoded video.



In the proposed perceptual quality model, the constant v4 is
calculated as a linear combination of the impairments. So v4,
the combined scaled distortion indicator, is expressed as
follows:

$v_4 = a \cdot BM + b \cdot BL + c \cdot JR$ (3)

In equation (3), a, b and c are weighting coefficients that
adjust the impact of each impairment in the perceptual
calculation. Their values are derived experimentally using
the training set of videos, and the results are analyzed for
video content with different spatial and temporal activities.
The expected result of each metric is computed as per the
P.910 standard, and different coefficients are tried on the
training set to find the minimum error.
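
The way equation (3) feeds the MOS estimate can be sketched as follows. The sketch assumes the G.1070 coding-quality form I_coding = v3 (1 - 1 / (1 + (Br_v / v4)^v5)) and MOS = 1 + I_coding; the coefficient values in the usage lines are placeholders for illustration, not trained values from this work.

```python
def combined_distortion(bm, bl, jr, a, b, c):
    """Equation (3): v4 as a weighted sum of the three impairment metrics."""
    return a * bm + b * bl + c * jr


def coding_quality(bitrate, v3, v4, v5):
    """Assumed G.1070-style coding quality at the optimal frame rate.

    v4 must be positive; a larger v4 yields a lower coding quality.
    """
    return v3 * (1.0 - 1.0 / (1.0 + (bitrate / v4) ** v5))


def estimate_mos(bitrate, bm, bl, jr, a, b, c, v3, v5):
    """MOS estimate, assuming MOS = 1 + I_coding (coding impairments only)."""
    v4 = combined_distortion(bm, bl, jr, a, b, c)
    return 1.0 + coding_quality(bitrate, v3, v4, v5)


# Placeholder coefficients and metric values, for illustration only.
mos_mild = estimate_mos(1.0, bm=50, bl=30, jr=0.4, a=0.02, b=0.01, c=0.5, v3=3.0, v5=1.0)
mos_blocky = estimate_mos(1.0, bm=80, bl=30, jr=0.4, a=0.02, b=0.01, c=0.5, v3=3.0, v5=1.0)
```

Raising the blockiness metric from 50 to 80 increases v4 and lowers the estimated MOS at the same bitrate, which is the behaviour the combined distortion indicator is meant to capture.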

The MOS value computed as in equation (2) provides the
measure of subjective quality of the video sequence. Because
the MOS value reflects the video impairments blockiness, blur
and jerkiness, the test results show that the proposed model
has high correlation with the standard full-reference quality
metric SSIM.

The accuracy comparison is based on the Pearson Correlation
Coefficient (PCC) and the Root Mean Square Error (RMSE), as
proposed in the VQEG (2003) standard.
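
The two accuracy measures can be computed as in the sketch below, which uses the textbook definitions of PCC and RMSE. Any score mapping that a VQEG test plan applies before the comparison is not included, and the function names are illustrative.

```python
import math


def pearson_correlation(predicted, reference):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_p * var_r)


def rmse(predicted, reference):
    """Root mean square error between two equal-length score lists."""
    n = len(predicted)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)
```

A PCC close to 1 together with a small RMSE indicates that the predicted scores follow the reference scores closely.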

Performance evaluation and discussions 

The proposed quality metric calculation is implemented in C
as part of the JM reference software, which we used for the
H.264 video encoding. The video resolution is standard
definition, and the encoding is set to three different
bitrates, 512 kbps, 1 Mbps and 2 Mbps, to show the effect of
these impairments at the encoder. Four standard-definition
test videos are used to train the weighting coefficients and
obtain the constants in equations (1) and (3) at the
different bitrates. The training video sequences are "mobile
and calendar", "parkrun", "shields" and "Stockholm" (Training
video sequence. http://media.xiph.org/). These training
sequences have varied spatial and temporal complexity. Since
v4 is the only variable and all other constants are fixed,
the change in perceived quality follows the change in v4. Six
different video sequences are used for testing; since the
parameters are trained, they need not be changed for
different kinds of video.

Figure 1 Block diagram of video coding layer of H.264 encoder with perceptual quality score.

The MOS for these video sequences is computed as per the
proposed method. Figures 2, 3 and 4 show the quality metric
performance for 512 kbps, 1 Mbps and 2 Mbps encoding,
respectively.

The combined scaled distortion indicator varies from 1.03 to
4.37 across the test vectors at 512 kbps. For the most
distorted video sequence, a test vector with high temporal
and spatial complexity, the value of 4.37 at 512 kbps is high
compared with 3.18 at 1 Mbps and 2.13 at 2 Mbps. The results
show that the coding distortion differs with spatial and
temporal activity and that the distortion indicator also
tracks the bitrate well. These results are compared against
the SSIM full-reference quality metric using PCC and RMSE, as
proposed by VQEG.

The average values show that the proposed model has higher
correlation in the quality calculation than the well-known
full-reference model SSIM (Wang et al. 2004), as shown in
Table 1. The PCC value is higher and the RMSE value is lower
than for the SSIM model. This shows that the MOS calculated
from the video impairments is more closely correlated with
user viewing experience than standard full-reference methods.

The proposed model covers the measurement of video artifacts
in H.264/AVC coded video related to intra and inter
compression, and the results show that the impairment-based
calculation correlates better than the reference models
presented by VQEG. Since the proposed method uses the
impairments in the video and is an NR method, it can capture
the combined effect of the encoder and the channel when used
at the decoder end. The method can be used in applications
that cannot obtain full-reference or reduced-reference
information, such as broadcasting, IPTV and video telephony.
The parameter training, however, needs to be done separately
for each codec. The work can be extended to compare the
computational complexity and to map these impairment
parameters from different channel and bitstream information.




Figure 3 MOS vs. combined scaled distortion indicator graph for 1 Mbps.



Conclusions 

A combined measure of perceived video quality for H.264/AVC
compression is proposed using a no-reference model. The
metrics were implemented in a C/C++ environment as part of
the JM software for H.264. The objective modeling of
subjective quality parameters was derived from the defined
standard model. The results are analyzed for correctness
against the actual content quality for a given encoding
scenario, which shows that the values are highly correlated
with users' viewing experience. The results are also compared
against a standard full-reference model and verified using
the comparison methods specified by VQEG for a set of
training and test vectors. These results show that a quality
model based on video impairment analysis, which has
relatively low computational requirements compared with
full-reference methods, provides a better indication of
quality.

Table 1 Performance comparison of proposed model with SSIM

Video quality assessment model    PCC      RMSE
Proposed model                    0.961    0.312
SSIM model                        0.763    0.571